System and method for continual decoding of brain states to multi-degree-of-freedom control signals in hands-free devices

ABSTRACT

A brain-machine interface system configured to decode neural signals to control a target device includes a sensor to sample the neural signals, and a computer-readable storage medium having software instructions, which, when executed by a processor, cause the processor to transform the neural signals into a common representational space stored in the system, provide the common representational space as a state representation to inform an Actor recurrent neural network policy of the system, generate and evaluate, utilizing a deep recurrent neural network of the system having a generative sequence decoder, predictive sequences of control signals, supply a control signal to the target device to achieve an output of the target device, determine an intrinsic biometric-based reward signal, from the common representational space, based on an expectation of the output of the target device, and supply the intrinsic biometric-based reward signal to a Critic model of the system.

CROSS-REFERENCE TO RELATED APPLICATION(S)

The present application claims priority to and the benefit of U.S. Provisional Application No. 62/869,867, filed Jul. 2, 2019, the entire contents of which are incorporated herein by reference.

BACKGROUND

1. Field

The present disclosure relates generally to systems and methods for decoding neural signals into control signals for target devices.

2. Description of Related Art

Brain-Machine Interface (BMI) systems may be utilized to decode a user's neural signals into control signals. In some related art BMI systems that do not explicitly follow a stereotypical stimulus-response paradigm, decoders utilize the endogenous “event related potential” (ERP) of the user's brain as the control signal. ERPs are involuntary electrical brain signals that can be detected transcranially, usually in a 300-millisecond or 400-millisecond window of time, and are typically limited to a single binary signal. In these scenarios, the control signal capacity increases but at the cost of lower accuracy. A more general electroencephalogram (EEG) BMI is known as biofeedback, where the user learns to control some aspect of EEG, such as the level of alpha waves. Typically, an external computer produces an auditory or visual indication of the EEG aspect the user wants to learn to control, and that indication helps the user to learn the appropriate brain state. Subsequently, the user may be able to control some aspect of a machine or device by producing that brain state. However, in this related art system, the control signal has one degree of freedom, has very high latency, is very inaccurate, and is completely dependent on the user's learning.

SUMMARY

The present disclosure relates to various embodiments of a brain-machine interface system configured to decode neural signals to control a target device. In one embodiment, the brain-machine interface system includes at least one sensor configured to sample the neural signals, and a computer-readable storage medium having software instructions stored therein, which, when executed by a processor, cause the processor to transform the neural signals into a common representational space stored in the brain-machine interface system, provide the common representational space as a state representation to inform an Actor recurrent neural network policy of the brain-machine interface system, generate and evaluate, utilizing a deep recurrent neural network of the brain-machine interface system having a generative sequence decoder, predictive sequences of control signals for the target device, supply a specific control signal derived from the predictive sequences of control signals to the target device to achieve an output of the target device, determine an intrinsic biometric-based reward signal, from the common representational space, based on an expectation of the output of the target device, and supply the intrinsic biometric-based reward signal to a Critic model of the brain-machine interface system.

The at least one sensor may be configured to sample the neural signals invasively or non-invasively.

The at least one sensor may include an invasive electrocorticographic (ECoG) device or an intracranial electroencephalography (iEEG) device.

Transforming the neural signals into the common representational space may include identifying regions of the neural signals with informative activations for controlling the target device, and performing subject-specific transforms to align the regions across different users.

The intrinsic biometric-based reward may be a positive emotional response when the output of the target device matches a user's intended output.

The intrinsic biometric-based reward may be a negative emotional response when the output of the target device does not match a user's intended output.

The software instructions, when executed by the processor, may cause the processor to generate and evaluate the predictive sequences of the control signals utilizing a tree search.

The present disclosure is also directed to various embodiments of a non-transitory computer-readable storage medium. In one embodiment, the non-transitory computer-readable storage medium has software instructions stored therein, which, when executed by a processor, cause the processor to transform neural data from an individual user into a common representational space of a brain-machine interface system, provide the common representational space as a state representation to inform an Actor recurrent neural network policy of the brain-machine interface system, generate and evaluate, utilizing a deep recurrent neural network of the brain-machine interface system having a generative sequence decoder, predictive sequences of control signals for a target device, supply a specific control signal derived from the predictive sequences of control signals to the target device to produce an output of the target device, determine an intrinsic biometric-based reward, from the common representational space, based on the individual user's expectation of the output of the target device, and supply the intrinsic biometric-based reward to a Critic model of the brain-machine interface system.

The instructions, when executed by a processor, may cause the processor to transform the neural data into the common representational space by identifying regions of the neural data with informative activations for controlling the target device, and performing subject-specific transforms to align the regions across different users.

The instructions, when executed by a processor, may cause the processor to determine the intrinsic biometric-based reward by decoding an emotional response from the individual user to the output of the target device.

The intrinsic biometric-based reward may be a positive emotional response when the output of the target device matches the individual user's intended output.

The intrinsic biometric-based reward may be a negative emotional response when the output of the target device does not match the individual user's intended output.

The software instructions, when executed by the processor, may cause the processor to generate and evaluate the predictive sequences of the control signals utilizing a tree search.

The present disclosure is also directed to various methods of controlling a target device utilizing neural data. In one embodiment, the method includes sampling the neural data from a user, transforming the neural data into a common representational space of a brain-machine interface system, supplying the common representational space as a state representation to inform an Actor recurrent neural network policy of the brain-machine interface system, generating and evaluating, utilizing a deep recurrent neural network of the brain-machine interface system having a generative sequence decoder, predictive sequences of control signals for the target device, supplying a specific control signal derived from the predictive sequences of control signals to the target device to produce an output of the target device, determining an intrinsic reward, from the common representational space, based on the user's expectation of the output of the target device, and supplying the intrinsic reward to a Critic model of the brain-machine interface system.

Evaluating the predictive sequences of the control signals may include a tree search.

Transforming the neural data into the common representational space may include identifying regions of the neural data with informative activations for controlling the target device, and performing subject-specific transforms to align the regions across different users.

Determining the intrinsic reward may include decoding an emotional response from the user to the output of the target device.

The emotional response may be a positive emotional response when the output of the target device is expected.

The emotional response may be a negative emotional response when the output of the target device is unexpected.

Sampling the neural data from the user may be performed invasively or non-invasively.

This summary is provided to introduce a selection of concepts that are further described below in the detailed description. This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used in limiting the scope of the claimed subject matter.

BRIEF DESCRIPTION OF THE DRAWINGS

The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.

The features and advantages of embodiments of the present disclosure will be better understood by reference to the following detailed description when considered in conjunction with the accompanying figures. In the figures, like reference numerals are used throughout the figures to reference like features and components. The figures are not necessarily drawn to scale.

FIG. 1 illustrates a brain-machine interface (BMI) system for decoding neural signals into control signals for a target device according to one embodiment of the present disclosure;

FIG. 2 is a flowchart illustrating tasks of a method for decoding neural signals into control signals for a target device according to one embodiment of the present disclosure;

FIG. 3 is a graph comparing the performance of the systems and methods of the present disclosure to related art models (linear discriminant analysis (LDA) and hidden Markov model (HMM));

FIG. 4 is a block diagram illustrating various computerized systems communicating with one another which may be used to implement embodiments of the present invention; and

FIG. 5 is a block diagram illustrating a processing system, a processing circuit, or a portion of a processing system or processing circuit used in conjunction with at least one embodiment of the present invention.

DETAILED DESCRIPTION

The present disclosure is directed to various embodiments of brain-machine interface (BMI) systems and methods for determining control signals for controlling the movements of an external hands-free device, such as a six degree-of-freedom (DOF) robot, from sensed dynamic properties of the brain of a user (e.g., blood flow or electrical emanations of the user's brain). These control signals may be, for instance, steering commands to an unmanned aerial vehicle or control signals to a robot arm. According to various embodiments of the present disclosure, the neural signals of the user may be sensed invasively (e.g., from sensors surgically implanted inside the brain) or non-invasively (e.g., by sensors placed outside the user's skull).

FIG. 1 illustrates a brain-machine interface (BMI) system 100 according to one embodiment of the present disclosure that is configured to decode neural data 200 from users into control signals for controlling a target device (e.g., an external device) 300, such as a remote hands-free device (e.g., a robot or an unmanned aerial vehicle). FIG. 2 is a flowchart illustrating tasks of a method 400 of decoding the neural data 200 into control signals for controlling the target device 300 according to one embodiment of the present disclosure utilizing the BMI system 100 illustrated in FIG. 1.

In the illustrated embodiment, the method 400 includes a task 405 of sampling the neural data 200 from a user. The neural data 200 sampled in task 405 may be sensed invasively (e.g., from sensors surgically implanted inside the brain of the user) or non-invasively (e.g., by sensors placed outside the user's skull). In one or more embodiments, the task 405 of sampling the neural data 200 may include sampling the neural data 200 from the user while the user is performing a series of tasks with varying cognitive loads. In one or more embodiments, the task 405 of sampling the neural data 200 of the user may be performed utilizing an invasive electrocorticographic (ECoG) device or an intracranial electroencephalography (iEEG) device (e.g., Dorsal M1 and ventral sensorimotor (M1+S1) electrodes) while the user performs a finger flexion task.

In the illustrated embodiment, the method 400 also includes a task 410 of transforming the neural data 200 sampled in task 405 into a common representational space 101 (i.e., common-space neural data) stored in the BMI system 100, as shown in FIG. 1. The task 410 of transforming the neural data 200 into the common representational space 101 may be performed in any suitable manner, such as the method described in Van Uden C E, Nastase S A, Connolly A C, Feilong M, Hansen I, Gobbini M I, et al., “Modeling semantic encoding in a common neural representational space.” bioRxiv. Cold Spring Harbor Laboratory; 2018:288605, the entire content of which is incorporated herein by reference. In one or more embodiments, the transformation into this common representational space 101 may be learned during initial calibration through standard optimization processes or, more simply, through canonical-correlation analysis. The common representational space 101 enables sharing of neural data samples from the same user across different sessions, and among different users, without sacrificing individuality (i.e., the utilization of a functionally-derived shared model space across users enables pooling across users while maintaining the specificity of each user's model). In one or more embodiments, the task 410 of transforming the neural data 200 into the common representational space 101 includes identifying regions of the sampled neural data 200 with the most predictable or consistent activations as control signals for controlling the target device 300, and calculating subject-specific transforms to align these regions across different users. For example, in one or more embodiments, the task 410 of transforming the neural data 200 into the common representational space 101 includes calculating transformations between the neural data 200 of different users and/or the neural data 200 of the same user at different times.
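The canonical-correlation option for learning the transformation into the common representational space can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: each user's calibration data is assumed to be a matrix of time-aligned samples (rows) by feature channels (columns) recorded during the same task, and the names `fit_cca`, `_inv_sqrt`, and the ridge term `eps` are hypothetical. The synthetic data at the end stands in for two users whose recordings share a common latent signal.

```python
import numpy as np


def _inv_sqrt(cov, eps):
    """Inverse square root of a symmetric covariance matrix (ridge-regularized)."""
    vals, vecs = np.linalg.eigh(cov + eps * np.eye(cov.shape[0]))
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


def fit_cca(x, y, n_components, eps=1e-6):
    """Fit canonical-correlation projections between two users' data.

    x: (n_samples, n_features_x) calibration data from user A.
    y: (n_samples, n_features_y) time-aligned calibration data from user B.
    Returns (wx, wy, corr): projections into a shared n_components-dimensional
    space and the canonical correlations in descending order.
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    cxx = x.T @ x / (n - 1)
    cyy = y.T @ y / (n - 1)
    cxy = x.T @ y / (n - 1)
    kx, ky = _inv_sqrt(cxx, eps), _inv_sqrt(cyy, eps)
    # The SVD of the whitened cross-covariance yields the canonical directions.
    u, s, vt = np.linalg.svd(kx @ cxy @ ky)
    wx = kx @ u[:, :n_components]
    wy = ky @ vt.T[:, :n_components]
    return wx, wy, s[:n_components]


# Hypothetical usage: two "users" driven by the same 2-D latent signal.
rng = np.random.default_rng(0)
z = rng.standard_normal((500, 2))
x = z @ rng.standard_normal((2, 6)) + 0.05 * rng.standard_normal((500, 6))
y = z @ rng.standard_normal((2, 5)) + 0.05 * rng.standard_normal((500, 5))
wx, wy, corr = fit_cca(x, y, n_components=2)
shared_x = (x - x.mean(axis=0)) @ wx  # user A expressed in the shared space
shared_y = (y - y.mean(axis=0)) @ wy  # user B expressed in the shared space
```

CCA is the simplest of the options the text names; an iterative Procrustes-style alignment (as in hyperalignment) would be a natural alternative when more than two users must be mapped into one space.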

In the illustrated embodiment, the method 400 also includes a task 415 of utilizing the common-space neural data 101 as a state representation to inform an Actor recurrent neural network (RNN) policy 102 of the BMI system 100 as to which control signals to send to the target device 300 (e.g., an unmanned aerial vehicle or a robot arm). Task 415 may be performed utilizing a long short-term memory (LSTM) RNN architecture, such as that described in Hochreiter, S., & Schmidhuber, J. (1997). “Long short-term memory.” Neural Computation, 9(8), 1735-1780, the entire content of which is incorporated herein by reference.
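A minimal numpy sketch of how an LSTM-based Actor policy might map a short sequence of common-space feature vectors to a probability distribution over discrete control signals is given below. The class name `LSTMActor`, the dimensions, the random initialization, and the discrete action set are illustrative assumptions. The cell uses the now-standard forget gate, which is a later refinement of the 1997 formulation. A working policy would be trained (for example, through the Critic feedback of task 430) rather than used with random weights.

```python
import numpy as np


def _sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


class LSTMActor:
    """Toy LSTM policy: common-space feature sequence -> control-signal probabilities."""

    def __init__(self, n_in, n_hidden, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(n_hidden)
        # Stacked weights for the input, forget, candidate, and output gates.
        self.w = rng.uniform(-scale, scale, (4 * n_hidden, n_in + n_hidden))
        self.b = np.zeros(4 * n_hidden)
        self.w_out = rng.uniform(-scale, scale, (n_actions, n_hidden))
        self.n_hidden = n_hidden

    def step(self, x, h, c):
        """One LSTM time step with input, forget, and output gating."""
        z = self.w @ np.concatenate([x, h]) + self.b
        i, f, g, o = np.split(z, 4)
        c = _sigmoid(f) * c + _sigmoid(i) * np.tanh(g)
        h = _sigmoid(o) * np.tanh(c)
        return h, c

    def policy(self, features):
        """Run over (n_steps, n_in) features; return a softmax over actions."""
        h = np.zeros(self.n_hidden)
        c = np.zeros(self.n_hidden)
        for x in features:
            h, c = self.step(x, h, c)
        logits = self.w_out @ h
        p = np.exp(logits - logits.max())  # numerically stable softmax
        return p / p.sum()
```

For example, `LSTMActor(8, 16, 4).policy(features)` with `features` of shape `(10, 8)` returns four probabilities summing to one, one per hypothetical control signal.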

In the illustrated embodiment, the method 400 also includes a task 420 of predicting, utilizing a deep RNN 103 of the BMI system 100 illustrated in FIG. 1, a control signal for the target device 300 as the mental state of the user is forming (i.e., predicting the control signal to send to the target device 300 based on the neural data in the common representational space 101 as the mental state of the user is developing). Utilizing the deep RNN 103 enables decoding multi-degree-of-freedom (multi-DOF) control signals from temporal cascades of neural activity, which provides the ability to incorporate diverse information across time and space into the control signal. The underlying learning algorithm utilized in task 420 is a generative sequence-to-sequence decoder capable of sequence decoding by leveraging RNN cells for the generator and discriminator components, using a tree search method on the action space, and propagating the policy reward gradient after full-sequence decoding (i.e., during the task 420 of predicting the control signals, the sequence decoder evaluates the proposed actions in a Monte-Carlo-Tree-Search-like fashion). Generative sequence decoders are described in Yu L, Zhang W, Wang J, Yu Y. “SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient.” 2016. pp. 1-11. Available from: https://arxiv.org/abs/1609.05473, the entire content of which is incorporated herein by reference. Monte-Carlo-Tree-Searches are described in Kocsis, L., & Szepesvári, C. (2006, September). “Bandit based Monte-Carlo planning.” In European Conference on Machine Learning (pp. 282-293). Springer, Berlin, Heidelberg, the entire content of which is incorporated herein by reference. The implicit predictions of the tree search may be utilized to expedite control selection and thereby reduce latency for decoder classification. In this manner, the task 420 is configured to generate predictive sequences of control signals to evaluate multiple potential outcomes simultaneously. Thus, the task 420 of utilizing the deep RNN 103 is configured to improve the speed and accuracy of the BMI system 100 in controlling the target device 300 using the user's neural data 200.
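The tree-search evaluation of proposed action sequences can be illustrated with a compact UCT (UCB1 applied to trees, in the spirit of Kocsis and Szepesvári) over short sequences of discrete control signals. This is a sketch under stated assumptions: `rollout_reward` is a hypothetical stand-in for the discriminator's score of a fully decoded sequence, the action set is discrete, and the exploration constant `c` is arbitrary. The function returns only the first action of the best sequence, mirroring how the implicit predictions of the search could let the decoder commit to a control signal early.

```python
import math
import random


def uct_first_action(actions, depth, rollout_reward, n_iter=2000, c=1.4, seed=0):
    """Choose the first control signal of a length-`depth` sequence via UCT.

    rollout_reward(seq) -> float in [0, 1] scores a complete action sequence
    (a hypothetical stand-in for the discriminator's sequence score).
    """
    rng = random.Random(seed)
    visits, totals = {}, {}  # statistics keyed by action-prefix tuples

    def ucb(child, parent_visits):
        mean = totals[child] / visits[child]
        return mean + c * math.sqrt(math.log(parent_visits) / visits[child])

    for _ in range(n_iter):
        prefix, path = (), [()]
        # Selection / expansion: follow UCB1 until reaching an unvisited child.
        while len(prefix) < depth:
            children = [prefix + (a,) for a in actions]
            unvisited = [ch for ch in children if ch not in visits]
            if unvisited:
                prefix = rng.choice(unvisited)
                path.append(prefix)
                break
            parent_visits = visits[prefix]
            prefix = max(children, key=lambda ch: ucb(ch, parent_visits))
            path.append(prefix)
        # Simulation: complete the sequence at random, then score it in full.
        tail = tuple(rng.choice(actions) for _ in range(depth - len(prefix)))
        reward = rollout_reward(prefix + tail)
        # Backpropagation: credit every node on the path with the sequence reward.
        for node in path:
            visits[node] = visits.get(node, 0) + 1
            totals[node] = totals.get(node, 0.0) + reward

    # Commit to the most-visited first action.
    return max(actions, key=lambda a: visits.get((a,), 0))
```

As a toy usage, scoring sequences by their per-position agreement with a hypothetical intended sequence `(1, 2, 0)` drives the search to concentrate its visits on the subtree that starts with action `1`. Because the reward is computed only after a full sequence is formed and then propagated back up the path, the sketch reflects the "reward after full-sequence decoding" idea described above.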

In the illustrated embodiment, the method 400 also includes a task 425 of controlling the target device 300 (e.g., an external hands-free device such as an unmanned aerial vehicle or a robot arm) based on the neural data 200 of the user.

In the illustrated embodiment, the method 400 also includes a task 430 of providing an online reward from an online reinforcement learning agent 104 to a Critic model 105 of the BMI system 100, as illustrated in FIG. 1. In one or more embodiments, the task 430 includes extracting an intrinsic biometric-based reward signal from the user's neural data in the common representational space 101 based on the user's expectations of a successful control of the target device 300. The task 430 of extracting an intrinsic biometric-based reward signal from the user's neural data may be performed by any suitable technique, such as that described in Zhao Y, Hessburg J P, Kumar J N A, Francis J T T. “Paradigm Shift in Sensorimotor Control Research and Brain Machine Interface Control: The Influence of Context on Sensorimotor representations.” bioRxiv. Cold Spring Harbor Laboratory; 2018:239814, the entire content of which is incorporated herein by reference. A suitable Critic model is described in Haarnoja, T., Zhou, A., Abbeel, P., & Levine, S. (2018). “Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor.” arXiv preprint arXiv:1801.01290, the entire content of which is incorporated herein by reference. In one or more embodiments, the task 430 includes decoding the emotional response of a user to the intended action of the target device 300. For instance, an unsuccessful translation of the neural data 200 into a control signal will result in poor performance of the target device 300 and will thus elicit a negative emotional response from the user. Conversely, a successful translation of the neural data 200 into a control signal will result in the desired performance of the target device 300 and will thus elicit a positive emotional response from the user. In one or more embodiments, the task 430 includes transforming these positive or negative emotional response(s) into a reward signal(s) and supplying the reward signal(s) to the Critic model 105 to drive online learning (e.g., the task 430 includes extracting an intrinsic reward signal from the common representational space 101 of the neural data 200 and informing the Critic model 105 based on how successfully the intended action was performed by the target device 300). In this manner, the task 430 enables continual or substantially continual adaptation of the mapping from the neural data 200 to the output control signals for controlling the target device 300 (i.e., the online reward is configured to enable adaptation to changes in the neural data 200).
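To make the reward path concrete, the following numpy sketch maps a decoded probability of a positive emotional response to a scalar intrinsic reward and applies one-step actor-critic updates. It is deliberately simpler than the soft actor-critic cited above (no entropy term, replay buffer, or twin critics). It assumes a linear critic V(s) = w·s, a softmax actor over discrete control signals with logits θs, and a hypothetical emotion decoder that outputs `p_positive`. All function names are illustrative.

```python
import numpy as np


def intrinsic_reward(p_positive):
    """Map a decoded P(positive emotional response) in [0, 1] to a reward in [-1, 1]."""
    return 2.0 * p_positive - 1.0


def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()


def actor_critic_step(theta, w, s, a, r, s_next, done,
                      gamma=0.9, lr_actor=0.05, lr_critic=0.1):
    """One TD(0) actor-critic update.

    theta: (n_actions, n_features) actor weights; w: (n_features,) critic weights.
    s, s_next: common-space state features; a: index of the control signal sent;
    r: intrinsic reward; done: True if s_next is terminal.
    """
    v_next = 0.0 if done else w @ s_next
    td_error = r + gamma * v_next - w @ s          # Critic's evaluation signal
    w = w + lr_critic * td_error * s               # Critic update
    probs = softmax(theta @ s)
    grad_log_pi = -np.outer(probs, s)              # d log pi(a|s) / d theta
    grad_log_pi[a] += s
    theta = theta + lr_actor * td_error * grad_log_pi  # Actor update
    return theta, w, td_error
```

In use, `r` would come from `intrinsic_reward(...)` applied to the decoder's output once the target device has responded. A positive TD error (the user reacted better than the Critic expected) raises the probability of the control signal just sent, and a negative one lowers it. This is the sense in which the emotional response can drive continual adaptation of the mapping.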

In one or more embodiments, the BMI system 100 may include the common representational space 101 of the users' neural data, the Actor RNN policy 102, the deep RNN 103, the online reinforcement learning agent 104, and the Critic model 105. In one or more embodiments, the BMI system 100 may also include the target device 300. In one or more embodiments, the target device 300 may not be part of the BMI system 100.

Reduction to Practice

A pilot study on the efficacy of using generative inputs to increase the accuracy and responsiveness of EEG decoding was performed as a proof of concept of the sequence decoder network in the BMI system 100. To this end, decoder latency and performance characteristics of the generator and discriminator long short-term memory (LSTM) networks were evaluated. Invasive electrocorticographic (ECoG) recordings from a finger flexion task were utilized. High gamma power from Dorsal M1 and ventral sensorimotor (M1+S1) electrodes, computed on a 150 millisecond (ms) window with a 50 ms slide, was utilized as the representative feature. The LSTM model was initialized with a sequence autoencoder and then trained on the power-electrode features. A non-stacked LSTM with 100 hidden units was utilized. The baseline comparative models were linear discriminant analysis (LDA), for which the temporal sequence was flattened and provided as the feature, and a hidden Markov model (HMM). While LDA provides strong baseline performance for structured tasks because it explicitly models the temporal response as a function of neural activity relative to movement onset, the HMM and LSTM learn a more general representation by capturing the temporal dynamics. As illustrated in FIG. 3, the LSTM is shown to provide more discriminating information earlier than the related art LDA and HMM models while also achieving better classification, especially when a generative adversarial network (GAN) without temporal relationship is utilized to augment the data to 10× the original observed data. In FIG. 3, the 10× refers to the amount of data augmentation using the generative model (i.e., ‘LSTM 10×’ was trained using 10× the data of ‘Observed Data Only LSTM’). As shown in FIG. 3 with the dashed arrow, source data augmentation demonstrates an improvement in sequence model (LSTM) latency, compared to the average of the conventional models (LSTM, LDA and HMM), of approximately 400 ms at 50% threshold accuracy. Accordingly, the sequence decoder of the BMI system 100 of the present disclosure exhibited improved latency and accuracy compared to related art systems and methods.
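The windowed feature and the latency measure used in the pilot study can be sketched as follows. This is an illustrative reconstruction under assumptions, not the study's code. The input is assumed to be already band-pass filtered to the high gamma band (band edges are not specified here). Power is taken as the mean squared amplitude per window, with a log applied. Latency is read as the first time an accuracy-versus-time curve reaches the 50% threshold, interpolated linearly between samples. The function names are hypothetical.

```python
import numpy as np


def windowed_log_power(signal, fs, win_ms=150, step_ms=50):
    """Log mean power per channel in sliding windows.

    signal: (n_samples, n_channels), assumed already band-pass filtered to the
    high gamma band; fs: sampling rate in Hz. Returns (n_windows, n_channels).
    """
    win = int(round(fs * win_ms / 1000))
    step = int(round(fs * step_ms / 1000))
    starts = range(0, signal.shape[0] - win + 1, step)
    power = np.array([np.mean(signal[s:s + win] ** 2, axis=0) for s in starts])
    return np.log(power + 1e-12)  # small offset avoids log(0) on silent channels


def latency_at_threshold(times_ms, accuracy, threshold=0.5):
    """First time an accuracy-vs-time curve reaches `threshold`,
    linearly interpolated between samples; None if it never does."""
    for k in range(len(accuracy)):
        if accuracy[k] >= threshold:
            if k == 0:
                return float(times_ms[0])
            t0, t1 = times_ms[k - 1], times_ms[k]
            a0, a1 = accuracy[k - 1], accuracy[k]
            # a0 < threshold <= a1 here, so a1 > a0 and the division is safe.
            return t0 + (threshold - a0) * (t1 - t0) / (a1 - a0)
    return None
```

With a 1 kHz sampling rate, a 1-second recording yields 18 windows of 150 samples spaced 50 samples apart. Comparing `latency_at_threshold` across models' accuracy curves is one way to express a latency difference such as the one marked by the dashed arrow in FIG. 3.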

Some or all of the operations described herein (e.g., the tasks 405-430 depicted in FIG. 2) may be performed by one or more processing circuits. For example, the software components of the BMI system 100 may be hosted on a server including a processing circuit, and each user and the host may use a user interface (e.g., in a web browser) displayed by a computer including a processing circuit. The server may perform the transformation of the users' neural data into the common representational space, may apply the state representation to inform the Actor RNN policy, may apply the online reward to the Critic model, and/or may generate and evaluate potential control signals utilizing the sequence decoder, as discussed above.

Various portions of embodiments of the present invention that refer to the use of a “processing circuit” may be implemented with logic gates, or with any other embodiment of a processing unit or processing circuit. The term “processing unit” or “processing circuit” is used herein to include any combination of hardware, firmware, and software, employed to process data or digital signals. Processing unit hardware may include, for example, application specific integrated circuits (ASICs), general purpose or special purpose central processing units (CPUs), digital signal processors (DSPs), graphics processing units (GPUs), and programmable logic devices such as field programmable gate arrays (FPGAs). In a processing unit or a processing circuit, as used herein, each function is performed either by hardware configured, i.e., hard-wired, to perform that function, or by more general purpose hardware, such as a CPU, configured to execute instructions stored in a non-transitory storage medium. A processing unit or a processing circuit may be fabricated on a single printed circuit board (PCB) or distributed over several interconnected PCBs. A processing unit or a processing circuit may contain other processing units or circuits; for example, a processing circuit may include two processing circuits, an FPGA and a CPU, interconnected on a PCB.

FIG. 4 is a block diagram illustrating various computerized systems communicating with one another which may be used to implement embodiments of the present invention.

As shown in FIG. 4, a system 500 according to some embodiments of the present disclosure connects with servers 501 (e.g., device to be operated) to perform the operations described herein, such as transforming the users' neural data into the common representational space, applying the state representation to inform the Actor RNN policy, applying the online reward to the Critic model, and/or generating and evaluating potential control signals utilizing the sequence decoder.

The system 500 connects via a network 502 to the servers 501 to send and/or receive information relating to the neural data and/or control signal derived therefrom of various user accounts (element 504) that may be accessed via mobile and non-mobile devices, non-limiting examples of which include desktop computers 506, laptop computers 508, smartphones 510, and other mobile devices 512. As can be appreciated by one skilled in the art, the user device is any device that can receive and transmit data (e.g., the neural data) via the network 502.

FIG. 5 is a block diagram illustrating a processing system, a processing circuit, or a portion of a processing system or processing circuit, referred to herein as a computer system, used in conjunction with at least one embodiment of the present invention.

An exemplary computer system 600 in accordance with an embodiment is shown in FIG. 5. Exemplary computer system 600 is configured to perform calculations, processes, operations, and/or functions associated with a program or algorithm. In one embodiment, certain processes and steps discussed herein are realized as a series of instructions (e.g., software program) that reside within computer readable memory units and are executed by one or more processing circuits of exemplary computer system 600. When executed, the instructions cause exemplary computer system 600 to perform specific actions and exhibit specific behavior, such as described herein.

Exemplary computer system 600 may include an address/data bus 610 that is configured to communicate information. Additionally, one or more data processing units, such as processing circuit 620, are coupled with address/data bus 610. Processing circuit 620 is configured to process information and instructions. In an embodiment, processing circuit 620 is a microprocessor. Alternatively, processing circuit 620 may be a different type of processor such as a parallel processor, or a field programmable gate array.

Exemplary computer system 600 is configured to utilize one or more data storage units. Exemplary computer system 600 may include a volatile memory unit 630 (e.g., random access memory (“RAM”), static RAM, dynamic RAM, etc.) coupled with address/data bus 610, wherein volatile memory unit 630 is configured to store information and instructions for processing circuit 620. Exemplary computer system 600 further may include a non-volatile memory unit 640 (e.g., read-only memory (“ROM”), programmable ROM (“PROM”), erasable programmable ROM (“EPROM”), electrically erasable programmable ROM (“EEPROM”), flash memory, etc.) coupled with address/data bus 610, wherein non-volatile memory unit 640 is configured to store static information and instructions for processing circuit 620. Alternatively, exemplary computer system 600 may execute instructions retrieved from an online data storage unit such as in “Cloud” computing. In an embodiment, exemplary computer system 600 also may include one or more interfaces, such as interface 650, coupled with address/data bus 610. The one or more interfaces are configured to enable exemplary computer system 600 to interface with other electronic devices and computer systems. The communication interfaces implemented by the one or more interfaces may include wireline (e.g., serial cables, modems, network adaptors, etc.) and/or wireless (e.g., wireless modems, wireless network adaptors, etc.) communication technology.

In one embodiment, exemplary computer system 600 may include an input device 660 coupled with address/data bus 610, wherein input device 660 is configured to communicate information (e.g., neural data) to processing circuit 620. In accordance with one embodiment, input device 660 is one or more non-invasive sensors (e.g., sensors placed outside the user's skull) or invasive sensors (e.g., sensors surgically implanted inside the brain), such as an invasive electrocorticographic (ECoG) device or an intracranial electroencephalography (iEEG) device (e.g., Dorsal M1 and ventral sensorimotor (M1+S1) electrodes). Alternatively, input device 660 may be an alphanumeric input device, such as a keyboard, that may include alphanumeric and/or function keys. In an embodiment, exemplary computer system 600 may include a cursor control device 670 coupled with address/data bus 610, wherein cursor control device 670 is configured to communicate user input information and/or command selections to processing circuit 620. In an embodiment, cursor control device 670 is implemented using a device such as a mouse, a track-ball, a track-pad, an optical tracking device, or a touch screen. The foregoing notwithstanding, in an embodiment, cursor control device 670 is directed and/or activated via input from input device 660, such as in response to the use of special keys and key sequence commands associated with input device 660. In an alternative embodiment, cursor control device 670 is configured to be directed or guided by voice commands.

In an embodiment, exemplary computer system 600 further may include one or more optional computer usable data storage devices, such as storage device 680, coupled with address/data bus 610. Storage device 680 is configured to store information and/or computer executable instructions. In one embodiment, storage device 680 is a storage device such as a magnetic or optical disk drive (e.g., hard disk drive (“HDD”), floppy diskette, compact disk read only memory (“CD-ROM”), digital versatile disk (“DVD”)). Pursuant to one embodiment, a display device 690 is coupled with address/data bus 610, wherein display device 690 is configured to display video and/or graphics. In an embodiment, display device 690 may include a cathode ray tube (“CRT”), liquid crystal display (“LCD”), field emission display (“FED”), plasma display or any other display device suitable for displaying video and/or graphic images and alphanumeric characters recognizable to a user.

Exemplary computer system 600 is presented herein as an exemplary computing environment in accordance with an embodiment. However, exemplary computer system 600 is not strictly limited to being a computer system. For example, an embodiment provides that exemplary computer system 600 represents a type of data processing analysis that may be used in accordance with various embodiments described herein. Moreover, other computing systems may also be implemented. Indeed, the spirit and scope of the present technology is not limited to any single data processing environment. Thus, in an embodiment, one or more operations of various embodiments of the present technology are controlled or implemented using computer-executable instructions, such as program modules, being executed by a computer. In one exemplary implementation, such program modules include routines, programs, objects, components and/or data structures that are configured to perform particular tasks or implement particular abstract data types. In addition, an embodiment provides that one or more aspects of the present technology are implemented by utilizing one or more distributed computing environments, such as where tasks are performed by remote processing devices that are linked through a communications network, or such as where various program modules are located in both local and remote computer-storage media including memory-storage devices.

It should be understood that the drawings are not necessarily to scale and that any one or more features of an embodiment may be incorporated in addition to or in lieu of any one or more features in another embodiment, and the orientation of the components may have any other suitable orientation in addition to the orientation depicted in the figures. Moreover, the tasks described above may be performed in the order described or in any other suitable sequence. Additionally, the methods described above are not limited to the tasks described. Instead, for each embodiment, one or more of the tasks described above may be absent and/or additional tasks may be performed. As used herein, the terms “substantially,” “about,” “approximately,” and similar terms are used as terms of approximation and not as terms of degree, and are intended to account for the inherent deviations in measured or calculated values that would be recognized by those of ordinary skill in the art.

While this invention has been described in detail with particular references to exemplary embodiments thereof, the exemplary embodiments described herein are not intended to be exhaustive or to limit the scope of the invention to the exact forms disclosed. Persons skilled in the art and technology to which this invention pertains will appreciate that alterations and changes in the described structures and methods of assembly and operation can be practiced without meaningfully departing from the principles, spirit, and scope of this invention, as set forth in the following claims, and equivalents thereof.

What is claimed is:
 1. A brain-machine interface system configured to decode neural signals to control a target device, the brain-machine interface system comprising: at least one sensor configured to sample the neural signals; and a computer-readable storage medium having software instructions stored therein, which, when executed by a processor, cause the processor to: transform the neural signals into a common representational space stored in the brain-machine interface system; provide the common representational space as a state representation to inform an Actor recurrent neural network policy of the brain-machine interface system; generate and evaluate, utilizing a deep recurrent neural network of the brain-machine interface system having a generative sequence decoder, predictive sequences of control signals for the target device; supply a specific control signal derived from the predictive sequences of control signals to the target device to achieve an output of the target device; determine an intrinsic biometric-based reward signal, from the common representational space, based on an expectation of the output of the target device; and supply the intrinsic biometric-based reward signal to a Critic model of the brain-machine interface system.
 2. The brain-machine interface system of claim 1, wherein at least one sensor is configured to sample the neural signals invasively or non-invasively.
 3. The brain-machine interface system of claim 2, wherein at least one sensor comprises an invasive electrocorticographic (ECoG) device or an intracranial electroencephalography (iEEG) device.
 4. The brain-machine interface system of claim 1, wherein transforming the neural signals into the common representational space comprises: identifying regions of the neural signals with informative activations for controlling the target device, and performing subject-specific transforms to align the regions across different users.
 5. The brain-machine interface system of claim 1, wherein the intrinsic biometric-based reward is a positive emotional response when the output of the target device matches a user's intended output.
 6. The brain-machine interface system of claim 1, wherein the intrinsic biometric-based reward is a negative emotional response when the output of the target device does not match a user's intended output.
 7. The brain-machine interface system of claim 1, wherein the software instructions, when executed by the processor, cause the processor to generate and evaluate the predictive sequences of the control signals utilizing a tree search.
 8. A non-transitory computer-readable storage medium having software instructions stored therein, which, when executed by a processor, cause the processor to: transform neural data from an individual user into a common representational space of a brain-machine interface system; provide the common representational space as a state representation to inform an Actor recurrent neural network policy of the brain-machine interface system; generate and evaluate, utilizing a deep recurrent neural network of the brain-machine interface system having a generative sequence decoder, predictive sequences of control signals for a target device; supply a specific control signal derived from the predictive sequences of control signals to the target device to produce an output of the target device; determine an intrinsic biometric-based reward, from the common representational space, based on the individual user's expectation of the output of the target device; and supply the intrinsic biometric-based reward to a Critic model of the brain-machine interface system.
 9. The non-transitory computer-readable storage medium of claim 8, wherein the instructions, when executed by a processor, cause the processor to transform the neural data into the common representational space by: identifying regions of the neural data with informative activations for controlling the target device, and performing subject-specific transforms to align the regions across different users.
 10. The non-transitory computer-readable storage medium of claim 8, wherein the instructions, when executed by a processor, cause the processor to determine the intrinsic biometric-based reward by decoding an emotional response from the individual user to the output of the target device.
 11. The non-transitory computer-readable storage medium of claim 10, wherein the intrinsic biometric-based reward is a positive emotional response when the output of the target device matches the individual user's intended output.
 12. The non-transitory computer-readable storage medium of claim 10, wherein the intrinsic biometric-based reward is a negative emotional response when the output of the target device does not match the individual user's intended output.
 13. The non-transitory computer-readable storage medium of claim 8, wherein the software instructions, when executed by the processor, cause the processor to generate and evaluate the predictive sequences of the control signals utilizing a tree search.
 14. A method of controlling a target device utilizing neural data, the method comprising: sampling the neural data from a user; transforming the neural data into a common representational space of a brain-machine interface system; supplying the common representational space as a state representation to inform an Actor recurrent neural network policy of the brain-machine interface system; generating and evaluating, utilizing a deep recurrent neural network of the brain-machine interface system having a generative sequence decoder, predictive sequences of control signals for the target device; supplying a specific control signal derived from the predictive sequences of control signals to the target device to produce an output of the target device; determining an intrinsic reward, from the common representational space, based on the user's expectation of the output of the target device; and supplying the intrinsic reward to a Critic model of the brain-machine interface system.
 15. The method of claim 14, wherein the evaluating the predictive sequences of the control signals comprises a tree search.
 16. The method of claim 14, wherein the transforming the neural data into the common representational space comprises: identifying regions of the neural data with informative activations for controlling the target device, and performing subject-specific transforms to align the regions across different users.
 17. The method of claim 14, wherein the determining the intrinsic reward comprises decoding an emotional response from the user to the output of the target device.
 18. The method of claim 17, wherein the emotional response is a positive emotional response when the output of the target device is expected.
 19. The method of claim 17, wherein the emotional response is a negative emotional response when the output of the target device is unexpected.
 20. The method of claim 14, wherein the sampling the neural data from the user is performed invasively or non-invasively.
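The claimed control loop (claims 1, 7, 14 and 15) can be illustrated with a deliberately small Python sketch. Every name in it is hypothetical and the sketch is a simplification under stated assumptions, not the disclosed implementation: the target device is a toy 1-D cursor (`simulate`), the decoded emotional response is replaced by a ±1 match signal (`intrinsic_reward`), the Actor recurrent neural network policy and generative sequence decoder are replaced by an exhaustive depth-limited search over short control sequences (`tree_search`), and the Critic model is a tabular value estimate updated by TD(0) (`td_update`).

```python
from itertools import product

# Hypothetical discrete control signals (e.g., step left, hold, step right).
ACTIONS = (-1, 0, 1)


def simulate(state, action):
    """Hypothetical stand-in for the target device: a cursor on a 1-D track."""
    return state + action


def intrinsic_reward(output, intended):
    """Stand-in for a decoded emotional response: +1.0 (positive) when the
    device output matches the user's intent, -1.0 (negative) otherwise."""
    return 1.0 if output == intended else -1.0


def tree_search(state, value, depth=2, gamma=0.9):
    """Expand every control sequence of length `depth` (a full tree with
    len(ACTIONS) ** depth leaves), score each path by the discounted sum of
    the critic's value estimates along it, and return the first action of
    the best-scoring sequence (ties keep the earliest sequence found)."""
    best_score, best_first = float("-inf"), None
    for seq in product(ACTIONS, repeat=depth):
        s, score = state, 0.0
        for t, action in enumerate(seq):
            s = simulate(s, action)
            score += gamma ** t * value.get(s, 0.0)
        if score > best_score:
            best_score, best_first = score, seq[0]
    return best_first


def td_update(value, state, reward, next_state, alpha=0.1, gamma=0.9):
    """Tabular TD(0) critic update driven by the intrinsic reward."""
    v = value.get(state, 0.0)
    target = reward + gamma * value.get(next_state, 0.0)
    value[state] = v + alpha * (target - v)
    return value[state]


def control_step(state, intended, value):
    """One pass of the loop: plan, act, receive intrinsic reward, update critic."""
    action = tree_search(state, value)
    output = simulate(state, action)
    reward = intrinsic_reward(output, intended)
    td_update(value, state, reward, output)
    return output, reward


if __name__ == "__main__":
    critic = {2: 1.0}  # hypothetical prior: the critic already values position 2
    print(control_step(0, 1, critic), critic)
```

In this sketch the intrinsic reward changes only the critic's value table, so repeated calls to `control_step` gradually shift which control sequences the search prefers; a system along the lines of the claims would instead learn over the common representational space rather than over raw cursor positions.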